Goto

Collaborating Authors

 k-nn regression


Fast Computation of Leave-One-Out Cross-Validation for $k$-NN Regression

arXiv.org Machine Learning

We describe a fast computation method for leave-one-out cross-validation (LOOCV) for $k$-nearest neighbours ($k$-NN) regression. We show that, under a tie-breaking condition for nearest neighbours, the LOOCV estimate of the mean square error for $k$-NN regression is identical to the mean square error of $(k+1)$-NN regression evaluated on the training data, multiplied by the scaling factor $(k+1)^2/k^2$. Therefore, to compute the LOOCV score, one only needs to fit $(k+1)$-NN regression only once, and does not need to repeat training-validation of $k$-NN regression for the number of training data. Numerical experiments confirm the validity of the fast computation method.


such that, given a training size n, the regressor converges at a rate no better than n

Neural Information Processing Systems

Many nonparametric regressors were recently shown to converge at rates that depend only on the intrinsic dimension of data. These regressors thus escape the curse of dimension when high-dimensional data has low intrinsic dimension (e.g. a manifold). We show that k-NN regression is also adaptive to intrinsic dimension. In particular our rates are local to a query x and depend only on the way masses of balls centered at x vary with radius. Furthermore, we show a simple way to choose k = k(x) locally at any x so as to nearly achieve the minimax rate at x in terms of the unknown intrinsic dimension in the vicinity of x. We also establish that the minimax rate does not depend on a particular choice of metric space or distribution, but rather that this minimax rate holds for any metric space and doubling measure.


Twin Neural Network Improved k-Nearest Neighbor Regression

arXiv.org Artificial Intelligence

Twin neural network regression is trained to predict differences between regression targets rather than the targets themselves. A solution to the original regression problem can be obtained by ensembling predicted differences between the targets of an unknown data point and multiple known anchor data points. Choosing the anchors to be the nearest neighbors of the unknown data point leads to a neural network-based improvement of k-nearest neighbor regression. This algorithm is shown to outperform both neural networks and k-nearest neighbor regression on small to medium-sized data sets.


Syrupy Mouthfeel and Hints of Chocolate -- Predicting Coffee Review Scores using Text Based Sentiment

arXiv.org Artificial Intelligence

This paper uses textual data contained in certified (q-graded) coffee reviews to predict corresponding scores on a scale from 0-100. By transforming this highly specialized and standardized textual data in a predictor space, we construct regression models which accurately capture the patterns in corresponding coffee bean scores.


Quiet log noise with Python and machine learning

#artificialintelligence

Continuous integration (CI) jobs can generate massive volumes of data. When a job fails, figuring out what went wrong can be a tedious process that involves investigating logs to discover the root cause--which is often found in a fraction of the total job output. To make it easier to separate the most relevant data from the rest, the Logreduce machine learning model is trained using previous successful job runs to extract anomalies from failed runs' logs. This principle can also be applied to other use cases, for example, extracting anomalies from Journald or other systemwide regular log files. A typical log file contains many nominal events ("baselines") along with a few exceptions that are relevant to the developer.


Nonparametric Stochastic Contextual Bandits

AAAI Conferences

We analyze the K-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions.We attain tight results for top-arm identification and a sublinear regret of Õ ( T 1+ D / (2+ D ), where D is the context dimension, for a modified UCB algorithm that is simple to implement. We then give global intrinsic dimension dependent and ambient dimension independent regret bounds. We also discuss recovering topological structures within the context space based on expected bandit performance and provide an extension to infinite-armed contextual bandits. Finally, we experimentally show the improvement of our algorithm over existing approaches for both simulated tasks and MNIST image classification.


Nonparametric Stochastic Contextual Bandits

arXiv.org Machine Learning

We analyze the $K$-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions. We attain tight results for top-arm identification and a sublinear regret of $\widetilde{O}\Big(T^{\frac{1+D}{2+D}}\Big)$, where $D$ is the context dimension, for a modified UCB algorithm that is simple to implement ($k$NN-UCB). We then give global intrinsic dimension dependent and ambient dimension independent regret bounds. We also discuss recovering topological structures within the context space based on expected bandit performance and provide an extension to infinite-armed contextual bandits. Finally, we experimentally show the improvement of our algorithm over existing multi-armed bandit approaches for both simulated tasks and MNIST image classification.


Rates of Uniform Consistency for k-NN Regression

arXiv.org Machine Learning

We derive high-probability finite-sample uniform rates of consistency for $k$-NN regression that are optimal up to logarithmic factors under mild assumptions. We moreover show that $k$-NN regression adapts to an unknown lower intrinsic dimension automatically. We then apply the $k$-NN regression rates to establish new results about estimating the level sets and global maxima of a function from noisy observations.


Bayesian Kernel and Mutual $k$-Nearest Neighbor Regression

arXiv.org Machine Learning

We propose Bayesian extensions of two nonparametric regression methods which are kernel and mutual $k$-nearest neighbor regression methods. Derived based on Gaussian process models for regression, the extensions provide distributions for target value estimates and the framework to select the hyperparameters. It is shown that both the proposed methods asymptotically converge to kernel and mutual $k$-nearest neighbor regression methods, respectively. The simulation results show that the proposed methods can select proper hyperparameters and are better than or comparable to the former methods for an artificial data set and a real world data set.


k-NN Regression Adapts to Local Intrinsic Dimension

Neural Information Processing Systems

Many nonparametric regressors were recently shown to converge at rates that depend only on the intrinsic dimension of data. These regressors thus escape the curse of dimension when high-dimensional data has low intrinsic dimension (e.g. a manifold). We show that $k$-NN regression is also adaptive to intrinsic dimension. In particular our rates are local to a query $x$ and depend only on the way masses of balls centered at $x$ vary with radius. Furthermore, we show a simple way to choose $k = k(x)$ locally at any $x$ so as to nearly achieve the minimax rate at $x$ in terms of the unknown intrinsic dimension in the vicinity of $x$. We also establish that the minimax rate does not depend on a particular choice of metric space or distribution, but rather that this minimax rate holds for any metric space and doubling measure.